Introduction

Cavisson Systems’ NetHavoc implements core and advanced chaos engineering concepts across the infrastructure, network and application layers. Conducting chaos experiments across these layers allows DevOps, SRE, QE and Development teams to accurately assess the resilience of both the system and the application. With extensive integration with notification and ITSM tools, NetHavoc provides an enterprise-ready implementation of a mature chaos engineering platform.

NetHavoc vs Harness CE

NetHavoc offers a wide variety of chaos experiments across infrastructure, network and application levels. In-built integration with Cavisson’s cutting edge performance testing and observability solution makes it a matter of clicks to analyze the impact of chaos experiments, run with production level load, on end user experience by capturing actual user sessions and viewing the performance of your application across individual transaction(s)/page(s)/session(s).

NetHavoc provides chaos experiments or havocs across a wider application landscape than Harness’s current support, which enables development teams to identify areas of improvement from a code level perspective in an increasingly diverse and evolving development framework ecosystem.

Detailed Comparison

The following section provides an in-depth comparison between NetHavoc and Harness CE. The differentiators are highlighted to easily identify features/components that set NetHavoc apart from Harness CE.

Supported Deployment Method(s)	NetHavoc	Harness CE
SaaS	Yes	Yes
On-Premise/Self-Managed	Yes	Yes
Air-Gapped	Yes	Yes

Supported Chaos Experiments	NetHavoc	Harness CE
CPU Burst Consumes CPU Cores or CPU utilization %	Yes	Yes
Disk Swindle Fills Up disks on the server	Yes	Yes
I/O Shoot Up Increase I/O activity on the devices	Yes	Yes
Memory Outlay Increases RAM utilization	Yes	Yes
Abort Application Aborts Application by Process Name or Process ID	Yes	Yes
Terminate Cloud Instance Kill instances of cloud machines	Yes	Yes
Kill Server Shut down or reboot the machine	Yes	Yes
Teleport Change system time to the past or future	Yes	Yes
Kafka & JMS Distortion Impact Kafka Topics or JMS Queues with Message Influx	Yes	No
Intrude Network Packet corruption over interface	Yes	Yes
Trim Network Packets Induces packet loss over interface	Yes	Yes
Dormant Network Induces delay in network traffic	Yes	Yes
DNS Breakdown Rejects calls to DNS Server	Yes	Yes
Alter Inbound Services Induce delays & failures in service transactions.	Yes*	No(1)
Alter Outbound Services Induce delays & failures in callout to outbound services.	Yes*	No(2)
Method Invocation Delay method execution time.	Yes	No
Method Exception Generate exceptions in application methods.	Yes	No
Heap Memory Leak Increase JVM heap utilization.	Yes	No
Application CPU Burst Cause spike in CPU utilization via application process	Yes	Partially Supported in VMWare & Kubernetes
Thread Leak Create threads to analyze applications’ processing	Yes	No
Application Kill Terminate running application with customized error(s).	Yes	Yes

* Microservice design patterns like Circuit Breaker and Bulkhead can be tested with these features.

[1]: Harness does not induce delay directly; instead, it uses an intermediate proxy server. Furthermore, the delay is induced at the node/pod level, so delay on individual business transactions/services is not supported.

In contrast, NetHavoc does not need a proxy; it injects delay directly inside the JVM. The delay can also be injected at a more granular level, i.e. on individual business transactions/services.

[2]: Same as [1] above.
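
To make the note about Circuit Breaker testing and the footnote about per-transaction injection more concrete, here is a minimal Python sketch. It is not NetHavoc’s mechanism (which, per the footnote, works inside the JVM) and it uses no NetHavoc or Harness API. The names `inject_fault`, `CircuitBreaker`, `run_experiment` and the `checkout`/`browse` transactions are hypothetical and exist only for illustration. The sketch fails one transaction, leaves another untouched, and checks that a simple breaker opens after three consecutive failures.

```python
import functools
import time


class InjectedFault(Exception):
    """Raised by the hypothetical fault injector in place of a real failure."""


def inject_fault(delay_s=0.0, fail=False):
    """Wrap one function so each call is delayed and/or fails (illustrative only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if delay_s:
                time.sleep(delay_s)      # simulated latency on this function only
            if fail:
                raise InjectedFault(f"injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


class CircuitBreaker:
    """Minimal breaker: opens after `threshold` consecutive failures, never resets."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.threshold

    def call(self, func, fallback):
        if self.is_open:
            return fallback()            # short-circuit: the dependency is not called
        try:
            result = func()
        except Exception:
            self.failures += 1
            return fallback()
        self.failures = 0
        return result


def run_experiment(calls=5):
    """Fail only the hypothetical 'checkout' transaction and observe the breaker."""
    attempts = {"checkout": 0}

    @inject_fault(fail=True)
    def checkout():
        return "order placed"

    def counted_checkout():
        attempts["checkout"] += 1        # counts calls that actually reach checkout
        return checkout()

    def browse():                        # a transaction with no fault injected
        return "catalog"

    breaker = CircuitBreaker(threshold=3)
    responses = [
        breaker.call(counted_checkout, lambda: "try again later")
        for _ in range(calls)
    ]
    return responses, attempts["checkout"], breaker.is_open, browse()
```

With five calls, only the first three reach `checkout`; the remaining two are answered by the fallback while `browse` keeps working. That is the behaviour a transaction-level experiment is meant to confirm.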

Additional Chaos Engineering Capabilities	NetHavoc	Harness CE
Monitor complete application/system for resiliency readiness	Yes	Partially Supported via Limited Probes
GameDays/Chaos Scenario Management	Yes	Yes
Abort chaos experiments	Yes	Yes
Schedule chaos experiments	Yes	Yes
Visual Experiment Builder	Yes	Yes
Execute chaos experiments in parallel	Yes	Yes
Integration with CI/CD Tools via REST API	Yes	Yes
Conduct chaos experiments with production level load	Yes	No
Native observability spanning logs & user sessions	Yes	No
Analyze webpage/transaction performance during & after experiments	Yes	No

Runtime Monitoring	NetHavoc	Harness CE
Built-in AIOps engine to assist in root cause analysis	Yes	No
Perform auto-remediation at infrastructure/app level	Yes	No
Analyze impact on end user experience	Yes	No
Perform diagnostic activities (Thread/Heap/TCP Dump)	Yes	No
Advanced alerting algorithms to detect outliers, changes, etc.	Yes	No
Extensive log monitoring capabilities	Yes	No
Native Health Metrics	Yes	Only for Kubernetes
Create custom metrics	Yes	No
Single click comparison with multiple metrics	Yes	No
Out of the box Relational DB monitoring (Oracle, MySQL, MSSQL, PostgreSQL, etc.)	Yes	No
Out of the box NoSQL DB Monitoring (Cassandra, Redis, MongoDB, TSDB, Couchbase, Hadoop)	Yes	No

Analysis & Reports	NetHavoc	Harness CE
Customized reporting templates & scheduling options	Yes	No
In-built drill-down reports to analyze infra & app level impact	Yes	No
Integration with ITSM tools (ServiceNow, BMC Remedy)	Yes	No
Integration with wide array of communication tools (Slack, Teams, Spark, BigPanda).	Yes	Partially Supported

Administration, Security & Governance	NetHavoc	Harness CE
Comprehensive APIs	Yes	Yes
Built-in user management and authentication	Yes	Yes
Single Sign-On (LDAP, Okta)	Yes	Yes
Role-based Access Control	Yes	Yes
Full Audit Trails	Yes	Yes

Support	NetHavoc	Harness CE
SLA Guarantee	Yes	Yes
Training & Support	Yes	Yes
Online Community	Yes	Yes
Unified Experience Management Platform	Yes	No

Platform Specific Chaos Experiments Coverage

Cavisson’s NetHavoc provides extensive chaos experiment capabilities at both the application and infrastructure levels. Its support for multiple on-premise, cloud and containerized platforms clearly distinguishes it from Harness. The sections below assess the chaos experiments supported on each of these platforms:

GCP

Harness has extremely limited chaos experiment capabilities for GCP with disk loss and instance stop as the only chaos experiments supported. On the other hand, NetHavoc provides extensive infrastructure and application level chaos experiments or havocs on applications running on GCP.

Azure

Harness provides infrastructure level chaos experiments and a single application level chaos experiment on Azure. For applications, the only chaos experiment supported is to restrict access to an application instance. NetHavoc provides a wider variety of chaos experiments at the application level to accurately determine an application’s resilience in production level scenarios.

AWS

Harness provides a wider range of infrastructure and AWS service related chaos experiments but is lacking at the application level. Furthermore, Harness does not provide native observability for AWS. NetHavoc, apart from providing multiple application level chaos experiments, gives organizations detailed, in-built monitoring of various AWS services. This facilitates a 360-degree view of the distributed application ecosystem, making it easier to understand the extent of an experiment’s impact on both application and infrastructure resiliency. Moreover, additional infrastructure/service level AWS chaos experiments are planned for the upcoming quarters as part of the product roadmap, with the aim of providing unmatched coverage for AWS via NetHavoc.

Kubernetes

Harness provides a larger number of chaos experiments for Kubernetes than NetHavoc, but these are at the infrastructure level. NetHavoc provides a more granular approach at the application level, where users can inject havoc(s) at the individual transaction/service level. As with AWS, NetHavoc provides a wider range of observability metrics for Kubernetes than Harness. This level of detailed insight is essential to understanding the impact of your resiliency testing initiatives on the application and its underlying components in a micro-service oriented application landscape. Container, node, pod and control plane level metrics are all covered under Cavisson’s native observability, giving organizations a comprehensive insight into each component’s preparedness during outages.

Pivotal Cloud Foundry/Tanzu Application Service

Harness has a single chaos experiment available for PCF/TAS, whereas NetHavoc provides both system and application level chaos experiments along with in-built monitoring capabilities for applications deployed on Cloud Foundry. The monitoring module covers numerous integral Cloud Foundry services, such as Auctioneer, Nozzle, GoRouter, Controller and File Server, to provide a holistic view of how your system and application respond to chaos experiments.

Linux

As observed with other platforms, Harness provides chaos faults only at the system level on Linux, whereas NetHavoc’s chaos experiment capabilities cover both the application and infrastructure layers and also include experiments for Kafka and JMS based MQs.

Windows

NetHavoc provides resource and application level chaos experiments/havoc(s) for Windows in both VM and on-premise deployments. Harness, on the other hand, does not support any chaos experiments for on-premise Windows OS based machines, and its resource level chaos experiments are limited to Windows OS based VMWare VMs.

Due to this constraint, organizations with on premise Windows servers cannot utilize Harness and would require additional chaos experiment tools to carry out resiliency testing of their critical Windows based application(s)/infrastructure.

Conclusion

NetHavoc allows organizations and teams to conduct chaos experiments in conjunction with production-level traffic and extensive observability capabilities across applications, user sessions, logs, and infrastructure. Traditional methodologies that calculate resiliency scores with negligible observability insights and without appropriate user load fall well short of accurately depicting your mission critical application’s resiliency.

The above diagram illustrates how a unifying signal across various components (load, chaos experiments & observability) is fundamentally required to accurately drill down to the exact root cause behind issues being observed after conducting chaos experiments. Without this common signal, it becomes virtually impossible for traditional chaos engineering tools to gauge the extent and duration of KPI degradation without integrating multiple tools for application, user experience & log monitoring along with performance testing solutions.
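
As a rough illustration of what gauging the extent and duration of KPI degradation can mean once load, experiment timing and monitoring share one timeline, consider the following sketch. It assumes evenly spaced response-time samples and an experiment window given as start and end timestamps. The function name `degradation`, the 20% tolerance, and every sample value are invented for illustration; they are not NetHavoc output or measured results.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: int               # seconds on the shared timeline
    response_ms: float   # KPI value, here a hypothetical response time


def degradation(samples, start, end, tolerance=1.2):
    """Compare KPI samples against the pre-experiment baseline.

    A sample counts as degraded when it exceeds baseline * tolerance.
    Returns (baseline, peak_ratio, degraded_during_s, degraded_after_s).
    Assumes samples are sorted by time and evenly spaced.
    """
    before = [s.response_ms for s in samples if s.t < start]
    if len(samples) < 2 or not before:
        raise ValueError("need two or more samples, one before the experiment")
    baseline = sum(before) / len(before)
    step = samples[1].t - samples[0].t           # sampling interval in seconds
    limit = baseline * tolerance
    impacted = [s for s in samples if s.t >= start]
    peak_ratio = max(s.response_ms for s in impacted) / baseline
    during = sum(1 for s in impacted if s.t < end and s.response_ms > limit)
    after = sum(1 for s in impacted if s.t >= end and s.response_ms > limit)
    return baseline, peak_ratio, during * step, after * step


def example():
    """Made-up data: a fault injected from t=30s to t=60s, sampled every 10s."""
    values = [100, 100, 100, 250, 300, 280, 180, 130, 100, 100]
    samples = [Sample(t=i * 10, response_ms=v) for i, v in enumerate(values)]
    return degradation(samples, start=30, end=60)
```

For this made-up series, `example()` reports a baseline of 100 ms, a peak of 3x the baseline, 30 seconds of degradation during the experiment and a further 20 seconds after it ended. The calculation is only possible because the experiment window and the KPI samples are expressed on the same clock.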

Providing an extensive array of chaos experiment capabilities across the infrastructure and application layer becomes essential to accurately judge your IT ecosystem’s resiliency. Without this level of experiments spanning the entire spectrum, organizations cannot be prepared for outages seen in production as their resiliency preparedness remains limited.

Cavisson Systems’ NetHavoc elevates resiliency testing to resiliency engineering, helping organizations and teams refocus on staying ahead of the competition instead of spending a massive amount of time figuring out the what and why behind critical issues. Current tools are not adept at providing this level of insight and correlation, and so fall well short of ensuring that your mission critical applications are resilient enough to handle unplanned outages in production.

Contact us today to view NetHavoc’s cutting edge capabilities and elevate your end user experience by building resistance to failure.